perm filename IMPURE[F77,JMC] blob
sn#318040 filedate 1977-11-15 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00007 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 .require "lsp.pub[1,clt]" source_file
C00003 00003 .s IMPURE PROGRAMS AND UNCLEAN PROGRAMS
C00019 00004 .skip 2
C00030 00005 .ss Property Lists
C00035 00006 .ss Input and output.
C00036 00007 .skip to column 1
C00037 ENDMK
C⊗;
.require "lsp.pub[1,clt]" source_file;
.PORTION MAINPORTION
.sname ← SSNAME ← NULL
.NEXT SECTION
.NEXT SECTION
.NEXT SECTION
.s IMPURE PROGRAMS AND UNCLEAN PROGRAMS
Pure clean LISP programs admit the elegant mathematical
characterization described and applied in Chapter 3.
Specifically, each recursively defined pure clean LISP function
can be translated into a functional equation and minimization
schema in a first order language, and the function and schema
can be used to prove that the program meets its extensional
specifications.
In this chapter we shall describe some additional features of
practical LISP systems for which good simple mathematical characterizations
have not yet been found. Every computation that can be done with
these features can be done in pure clean LISP, but nevertheless there
are often good reasons for using these features. We shall examine
the features themselves and also the criteria that determine when
they should be used in preference to pure clean LISP.
All these features can be formalized in terms of their effect
on the "state of the computation", and this has been done by
Michael Gordon (197x), but his formalizations seem too complicated
for elementary exposition. The general idea is to introduce a state
variable %Ax%1 and to describe the "execution" of expressions as
functions on states that produce new states.
.ss "Sequential (Algol-like) programs."
Sequential programs are impure (by definition),
but can be clean - provided
certain restrictions are observed. The external notation for
sequential programs is adapted from that of Algol 60. We allow
as a term an expression of the form
%2program[<variable list>,<statement list>]%1,
where %2<variable list>%1 is a list of variables local to the
program, and %2<statement list>%1 is a list of the statements of
the program. As in Algol 60, the statements are separated by
semicolons, and any statement may be preceded by a label followed
by a colon.
The statements are of the following kinds:
1. Assignment statements of the form
%2<left hand side> ← <right hand side>%1,
where %2<left hand side>%1 is a variable, possibly subscripted, and
%2<right hand side>%1 is a LISP expression that can be evaluated.
2. %3go to%1 statements of the form
%3go to%2 a%1
where %2a%1 is a label or an expression that evaluates to a
label. Since labels are atoms in internal notation, any expression
evaluating to an atom may be used, but the usual case is a conditional
expression wherein the second element of each pair is an atom.
Should the resulting expression not be an atom or should that atom
not be used as a label, an error message will be generated.
3. A conditional statement which has the form
%2qif p1 qthen s1 qelse qif ... qelse qif pn qelse sn%1,
where the %2p%1's are propositional expressions having truth values,
and the %2s%1's are any statements.
4. %3return%2 e%1
where %2e%1 is an arbitrary expression. The effect of executing this
expression is to return from the program giving the program as a term
the value of the expression %2e%1.
As an example, we might write %2reverse%1 as follows:
.nofill
%2reverse u ← %3program%2[[v];
v ← qNIL;
a: qif qn u qthen %3return%2 v;
v ← qa u . v;
u ← qd v;
%3go to%1 a]%1.
.fill
The internal form of the same program is
.nofill
(DE REVERSE (U) (PROG (V)
(SETQ V NIL)
A
(COND ((NULL U) (RETURN V)))
(SETQ V (CONS (CAR U) V))
(SETQ U (CDR U))
(GO A)
))
.fill
where the paragraphing is only for the reader's convenience. Notice that
in internal form, labels are statements rather than attachments to
statements.
Here are some ways we might write %2append%1:
.nofill
%2u * v ← %3program%2[
%3return%2 qif qn u qthen v qelse qa u . [qd u * v]]%1,
.fill
is just a trivial rewrite of the recursive definition.
.nofill
%2u * v ← qprogram[w;
qif qn u qthen qreturn v;
w ← qa u . [qd u * v];
qreturn w]%1
.fill
is almost as close to the pure LISP form. If we want to replace the
recursion by a loop, we can write
.nofill
%2u * v ← qprogram[w, u1, v1;
w ← qNIL;
u1 ← u;
v1 ← v;
a: qif qn u1 qthen qgoto b;
w ← qa u1 . w;
u1 ← qd u;
qgoto a;
b: qif qn w qthen qreturn v1;
v1 ← qa w . v1;
qw ← qd w;
qgoto b]%1.
.fill
This corresponds to the recursive program
.nofill
%2u * v ← app[u,v,qnil]
app[u,v,w] ← qif qn u qthen [qif qn w qthen v qelse app[u,qa w . v,qd w]]
qelse app[qd u,v,qa u . w]%1
.fill
which can save some storage if it is compiled or interpreted without
using the stack.
We shall not treat the mathematics of sequential programs fully at
this point. However, a sequential program that occurs as a term can be
replaced by a term involving only functional programs provided the
following conditions are observed:
1. All variables local to the program are initialized before being
used. Of course, it could be taken as a convention that these variables
are initialized to qNIL when the program is entered. Then it wouldn't
affect the correctness of the translation to a functional program if some
of them weren't further initialized.
2. All assignments are to variables local to the program.
A new function definition is made for each branch (i.e.
conditional %3go_to%1) in the original program. The argument of the
function is a list whose elements are values assigned to the local
variables, i.e. the list has as many elements as there are local
variables. The value of the function is the value that will be
%3return%1ed from the program given that the local variables have the
values given in the argument when the branch is reached. The %3program%1
expression itself is replaced by an application of the function
corresponding to the first branch encountered in the program to the state
vector whose components have the values they will have the first time the
branch is reached written as functions of the external variables.
As an example, consider the following expression that gives the
result of appending the list %2a%1 with the reversal of %2a*b%1:
%2a * %3program%2[[u,v] u ← a * b; v ← qnil; loop: qif qn u
qthen qreturn v; v ← qa u . v; u ← qd u; qgoto loop]%1.
This expression can be replaced by
%2a * loop <u, qnil>%1
and the recursive definition
%2loop z ← qif qn qa z qthen qad z qelse loop <qda z, qaa z . qad z>%1.
Should a program make assignments to external variables, an
ambiguity arises. Suppose we evaluate the expression
%2<x, %3program%2[[] x ← qd x], x>%1.
when ⊗x has the value (A B). If the arguments of the list are "executed"
in right to left order, the expression will have value ((A B) (B) (B)),
whereas if the execution is from right to left, the value will be
((B) (B) (A B)). However, the mathematical semantics of expressions
don't prescribe any order of evaluation, and experience with ALGOL 60,
which prescribed left-to-right evaluation in order to give a definite
meaning to function procedures that alter external variables, has shown
that the quality of compiled code is greatly reduced by prescribing
a fixed order.
This ambiguity can be resolved by rewriting
expressions containing programs that change variables in the expression
using lambda expressions to force a desired order of evaluation.
Thus the above expression could be written
%2[λy:[λz:<z,y,y>][%3program%2[[] x ← qd x]]][x]%1
which would evaluate to ((B) (A B) (A B)) in the above case. Seven other
results could be obtained by putting different combinations of ⊗z and ⊗y between
the pointy brackets.
Once these ambiguities have been resolved, it would seem that we could
translate an outer sequential program containing inner sequential
programs into a function program in a unique way even allowing the
inner programs to make assignments to variables in the outer programs.
The rules for doing this haven't been worked out, however.
An assignment may be regarded as an %3program%1 expression.
Thus %2[x_←_qa_y].z%1 is an abbreviation for
%3program%2[[]x_←_y].z%1.
Exercises:
1. Write a LISP predicate that will determine whether a PROG
in internal form satisfies the above conditions for replacement by
a functional term.
2. Given that a PROG satisfies the conditions, write a LISP
functions to convert the PROG expression and give the a list of the
required auxiliary function definitions.
The reason for discussing how to rewrite the sequential program
as a functional program is to apply the the theory of chapter 3 to
the sequential case. The above conditions permit a purely local rewrite,
but it seems evident that if an outer sequential program containing
an inner one satisfies the initialization and local assignment (i.e.
no side-effect) conditions, then the program can be converted to functional
form even if it contains sequential subprograms that violate these
conditions provided that all assignments in the inner program are at
least local to the outer program. The rules for doing this have not
been formulated as yet.
There are three circumstances in which sequential programs are
preferred to functional programs:
1. In some cases and for some people, it is easier to think
about how to go from the initial conditions step-by-step to the final
result than to think about how the final result can be obtained from
easier cases. There seems to be a matter of taste to a substantial
extent, though the functional form seems to have clear conceptual
advantages for functions like %2append%1 and %2subst%1.
2. When there are a large number of variables, it can turn
out that the necessary functions have an unwieldy number of arguments.
3. When we are considering a program that interacts with
an environment instead of producing an answer, the sequential form
may be required, although this hasn't been proved.
Remarks:
1. Besides (SETQ var exp), most LISPs allow (SET exp1 exp2),
where ⊗exp1 is evaluated to determine to what variable the assignment
is to be made. If ⊗exp1 doesn't evaluate to a variable, an error
is signalled.
2. LISP admits arrays.
.skip 2
.ss Pseudo-functions that modify list structures
In pure LISP, list structure is never changed. ⊗car and ⊗cdr merely
move about in list structure, and ⊗cons creates new list structure
from the free storage list. Of course, a variable can get a new value
that corresponds, for instance, to putting an element on the end of
a list, but in pure LISP this can only be accomplished by creating
new list structure. In this section we will discuss operations that
actually modify list structure. They often have efficiency advantages,
but there use interferes with saving memory by using merging list
structure, and it is techniques for proving correctness of such
programs are not well developed.
The simplest way to express operations that modify list
structure is in the form of assignments to %2car-cdr%1
chains of variables, e.g. we may write
%2qada x ← y.z%1.
The effect of executing this statement is to replace the qada part
of the list structure pointed to by ⊗x with the value of the right
hand side. As an expression, the statement takes the value
assigned.
In internal notation, we use the pseudo-functions (RPLACA_X_Y)
and (RPLACD_X_Y) which correspond to the statements
%2qa x ← y%1
and
%2qd x ← y%1
respectively.
RPLACA and RPLACD are highly unclean. Thus the effect of
evaluating an expression involving RPLACA or RPLACD
depends on the state of list structure and not merely on the S-expressions
the variables have as values.
Suppose, for example, that the variables ⊗x1 and ⊗x2 each have the
value (A B), and consider the effect of (SETQ Z (RPLACA (CDR X) C)).
Z gets the value (C),
and X gets the values (A C), but the new value of Y depends on
whether its copy of (A B) is the same list structure as X's copy.
If yes, the new value of Y is also (A C); otherwise its value
remains (A B).
The uncleanliness of RPLACA and RPLACD means that the programmer
must know exactly what list structures merge. A typical application
is the pseudo-function %2nconc%1 defined by
.nofill
%2nconc[u,v] ← qprogram[[w];
qn u ← qreturn v;
w ← u;
loop: qif qn qd w qthen qbegin qd w ← v; qreturn u qend;
w ← qd w;
qgo loop]%1.
.fill
⊗nconc can also be given a definition that has the form of a recursive
program, namely
%2nconc[u,v] ← nconc1[u,u,v]%1
with
%2nconc[u,w,v] ← qif qn w qthen v qelse qif qn qd w
qthen [λz:u][rplacd[w,v]] qelse nconc[u, qd w, v]%1.
The value of %2nconc[u,v]%1 is just ⊗u*v, but it achieves
this result by modifying the last element of ⊗u to point to ⊗v. It
is typically used when no other variable points to list structure
that merges into ⊗u, and the old value of ⊗u will not be used
again. In that case, ⊗nconc is advantageous, because it does no
%2cons%1es, and thus can't initiate garbage collection. No formal
methods exist at present for proving that these conditions are met.
RPLACD can also be used to insert an element into the
middle of a list. Suppose, for example, that we wish to insert
the atom B after every occurrence of the atom A in a list ⊗u. We
can define a pure LISP function that gives a list obtained from
the list ⊗u by inserting such B's as follows:
%2insertb u ← qif qn u qthen qnil qelse qif qa u = A
qthen A.[B.insertb qd u] qelse qa u . insertb qd u%1.
Computing %2insertb u%1 does not change the value of ⊗u, because
new list structure is manufactured. On the other hand, if we define
.nofill
%2insertb2 u ← qprogram[[w];
w ← u;
loop: qif qn w qthen qreturn u;
qif qa w = A qthen qd w ← B . qd w;
w ← qd w;
qgo loop]%1,
.fill
we get a program that does the actual insertions. It is faster,
because it uses fewer %2cons%1es, but it will damage any list
structure that merges with that representing ⊗u, and it is
mathematically recalcitrant.
.ss RE-ENTRANT LIST STRUCTURE
So far we have only considered list structure in which all
%2car-cdr%1 chains terminate in atoms. Mathematically infinite
list structures are also possible. They can't be represented in
a computer, but since they can satisfy the LISP algebraic axioms, we
have had to exclude them in our proofs by axiom schemata of induction.
Another possibility is the re-entrant list structure. These can
be created by the RPLACA and RPLACD operations. For example, if
the variable ⊗x has value (A B), executing the statement
%2qa x ← x%1
gives rise to a circular structure like that of figure 1a. If we
then execute
%2qd x ← x%1
we get the doubly re-entrant structure of figure 1b. Other possibilities
are shown in the other parts of figure 1.
Re-entrant list structures do not correspond to S-expressions, and
the correctness of most of programs given in this book
depends on the data being non-re-entrant. Re-entrant structures can
be represented by S-expressions provided certain atoms are reserved
for use as labels and a suitable convention is adopted for distinguishing
ordinary atoms from labels and pointers to the labels. For example,
the structure of figure 1b might be represented
by ((LABEL A).((POINT A).(POINT A))).
Programs that compute with re-entrant structures need to keep lists of
vertices they have visited so that they can tell when they are about
to re-enter.
Thus the equivalence of merging list structure can be tested by the
predicate
%2equiv[x,y] ← equiv1[x,y,qnil] ≠ LOSE%1
where
%2equiv1[x,y,u] ← qif x qeq y ∨ match[x,y,u] qthen u
qelse qif u = LOSE ∨ qat x ∨ qat y ∨ unmatch[x,y,u] qthen LOSE
qelse equiv1[qd x,qd y,equiv1[qa x,qa y,[x.y].u]]%1,
%2match[x,y,u] ← ¬qn u ∧ [x.y qequal qa u ∨ match[x,y,qd u]]%1,
and
%2unmatch[x,y,u] ← ¬qn u ∧ [x = qaa u ∨ y = qda u ∨ unmatch[x,y,qd u]]%1.
EXERCISES
.item←0
#. Write a program to list the atoms in a possibly re-entrant
list structure.
#. Write a program to translate a graph description from
the notation of Chapter 1 to a re-entrant list structure.
#. Write functions to translate in each direction between
possibly re-entrant structures and the S-expressions corresponding to
them using the LABEL, POINTER notation described above.
#. Write an axiom schema of induction for finite possibly
re-entrant list structures. Hint: The schema can be baased on an
assertion of convergence of a program that scans a list structure keeping
track of the vertices it has alread visited.
.ss Property Lists
It is necessary to associate at least the following kinds of
information with atoms:
.item ← 0
#. Print names. Programs that print lists or S-expressions in an
external medium must be able to find the string of characters constituting
the %2print name%1 of an atom. Likewise, when an S-expression is read
from an external medium, the read program must be able to determine
whether there is already an atom with a given print name; if so it returns
a pointer to that atom; if not it creates a new atom with the given print
name.
#. Functions associated with atoms. Defining a function
associates the expression defining a functions with the atom
naming it. This is accomplished by putting a pointer to the
expression on the property list of the atom following a suitable flag.
A function defined by a machine language subroutine also has a pointer
to that subroutine on its property list.
#. A variable denoted by a atom has its value stored on the
property list of the atom in many LISP implementations.
Besides these standard uses of property lists, it is convenient
to allow the programmer to associate any information he chooses
with an atom. Such information can be retrieved for a particular atom
in a time that depends on the length of the property list of the atom,
and this is usually quite short. The time is independent of the
number of atoms.
Property lists are usually realized as ordinary lists.
Each "property" has an atom as its name, and the name of the
property is ordinarily followed by a pointer to the information
in question. Because this information can be a string of characters,
a floating point number or a pointer to a subroutine, the property
list is ordinarily not a proper LISP list, and functions that operate
on proper lists can cause errors if applied to property lists.
For example, a program that interpreted a floating point number
as a pair of pointers interpret random places in memory as
list structure.
For this reason, property lists are usually read and manipulated
by special functions. The three important functions are GETPROP
which gets the value (if any) associated with an atom serving as
a property name, PUTPROP which puts a property name and a property
on the property list of a given atom, and REMPROP which removes a
property. The exact form of these functions is implementation
dependent.
From a mathematical point of view PUTPROP acts like an
assignment statement and REMPROP like a special kind of an
assignment statement. GETPROP gets the value of a variable.
However, the mathematical properties of these LISP operations
haven't been systematically studied.
A major use of property lists in programming is in the
creation and modification of networks of data. Values attached to
atoms point to expressions that include other atoms.
.skip 2
.ss Input and output.
.skip to column 1
1. Sequential programs
2. Side effects
3. Property lists
4. Input and output
5. EXPRS, FEXPRS, LEXPRS, SUBRS, FSUBRS, LEXPRS